Abstract: Cloud storage systems provide low-cost, convenient network storage services, which has made them increasingly popular. However, the explosive growth of data puts growing pressure on cloud storage systems, and in particular a vast amount of redundant data wastes storage space. Data deduplication can effectively reduce the size of data by eliminating redundant data in storage systems. However, existing research on data deduplication focuses mainly on static scenarios such as backup and archive systems and is not well suited to cloud storage systems, where data is dynamic. In this paper, we propose a deduplication system architecture for cloud storage environments and describe the process for avoiding duplication at the file level and the chunk level on the client side. For the storage nodes (Snodes), we propose DelayDedupe, a delayed target deduplication scheme based on chunk-level deduplication and the access frequency of chunks, to reduce response time. Combined with replica management, this method determines whether new duplicated chunks produced by data modification are hot, and removes those duplicated chunks once they are no longer hot. Experimental results demonstrate that the DelayDedupe mechanism effectively reduces response time and makes the storage load across Snodes more balanced.

Keywords: Cloud storage, deduplication, DelayDedupe, replica, chunk, load balancing.
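
To make the client-side process summarized in the abstract concrete, the following is a minimal sketch of a file-level check followed by a chunk-level check. It is an illustration under stated assumptions, not the system described in this paper: the fixed-size chunking, the SHA-256 fingerprints, and the names `DedupIndex`, `upload`, `restore`, and `CHUNK_SIZE` are all hypothetical, and an in-memory dictionary stands in for the remote fingerprint index.

```python
import hashlib

# Hypothetical fixed chunk size in bytes; kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 fingerprint (an assumed choice of hash) for the data."""
    return hashlib.sha256(data).hexdigest()


class DedupIndex:
    """Hypothetical in-memory stand-in for the fingerprint index the client queries."""

    def __init__(self):
        self.files = {}   # file fingerprint -> ordered list of chunk fingerprints
        self.chunks = {}  # chunk fingerprint -> chunk bytes

    def upload(self, data: bytes) -> int:
        """Store data and return the number of bytes that had to be transferred."""
        file_fp = fingerprint(data)
        # File-level check: an identical file is already stored, so send nothing.
        if file_fp in self.files:
            return 0
        sent = 0
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            chunk_fp = fingerprint(chunk)
            # Chunk-level check: only chunks with unseen fingerprints are sent.
            if chunk_fp not in self.chunks:
                self.chunks[chunk_fp] = chunk
                sent += len(chunk)
            recipe.append(chunk_fp)
        self.files[file_fp] = recipe
        return sent

    def restore(self, file_fp: str) -> bytes:
        """Rebuild a file from its stored chunk recipe."""
        return b"".join(self.chunks[fp] for fp in self.files[file_fp])
```

In this sketch, uploading the same file twice transfers nothing the second time, and uploading a modified file transfers only the chunks whose fingerprints are new. A real client would exchange fingerprints with the server over the network rather than consult a local dictionary.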